Off-topic essay detection using short prompt texts
Abstract
Our work addresses the problem of predicting whether an essay is off-topic with respect to a given prompt or question, without any previously seen essays as training data. Prior work has used the similarity between essay vocabulary and prompt words to estimate the degree of on-topic content. In our corpus of opinion essays, prompts are very short, and using similarity with such prompts to detect off-topic essays yields error rates of about 10%. We propose two methods to enable better comparison of prompt and essay text. First, we automatically expand short prompts before comparison with words likely to appear in an essay written in response to that prompt. Second, we apply spelling correction to the essay texts. Both methods reduce error rates in off-topic essay detection and turn out to be complementary, leading to even better performance when used together.
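The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how prompts are expanded or how spelling is corrected, so the `related_words` table, the `vocabulary` set, the use of cosine similarity over word counts, and the `difflib`-based correction are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_prompt(prompt_tokens, related_words):
    """Append related words for each prompt token (hypothetical lookup table)."""
    expanded = list(prompt_tokens)
    for tok in prompt_tokens:
        expanded.extend(related_words.get(tok, []))
    return expanded


def correct_spelling(tokens, vocabulary):
    """Replace each out-of-vocabulary token with its closest vocabulary word, if any."""
    candidates = sorted(vocabulary)
    corrected = []
    for tok in tokens:
        if tok in vocabulary:
            corrected.append(tok)
        else:
            match = get_close_matches(tok, candidates, n=1, cutoff=0.8)
            corrected.append(match[0] if match else tok)
    return corrected


def topic_score(prompt, essay, related_words=None, vocabulary=None):
    """Prompt-essay similarity with optional prompt expansion and spelling correction."""
    p = tokenize(prompt)
    e = tokenize(essay)
    if related_words:
        p = expand_prompt(p, related_words)
    if vocabulary:
        e = correct_spelling(e, vocabulary)
    return cosine(Counter(p), Counter(e))


# Toy data, invented for illustration only.
prompt = "Should schools require uniforms"
essay = "Studnts wear unifroms so the dress code is fair for every student"
related = {"schools": ["students", "teachers"],
           "uniforms": ["dress", "code", "clothes"]}
vocab = {"should", "schools", "require", "uniforms",
         "students", "teachers", "dress", "code", "clothes"}

base = topic_score(prompt, essay)
expanded = topic_score(prompt, essay, related_words=related)
both = topic_score(prompt, essay, related_words=related, vocabulary=vocab)
print(base, expanded, both)
```

In this toy case the short prompt shares no words with the misspelled essay, expansion recovers some overlap, and spelling correction recovers more. An essay would then be flagged as off-topic when its score falls below some threshold, which this sketch leaves unspecified.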
Similar resources
Detecting Off-topic Responses to Visual Prompts
Automated methods for essay scoring have made great progress in recent years, achieving accuracies very close to those of human annotators. However, a known weakness of such automated scorers is that they do not take into account the semantic relevance of the submitted text. While there is existing work on detecting answer relevance given a textual prompt, very little previous research has been done to incorporate...
Advanced Capabilities for Evaluating Student Writing: Detecting Off-Topic Essays Without Topic-Specific Training
We have developed a method to identify when a student essay is off-topic, i.e. the essay does not respond to the test question topic. This task is motivated by a real-world problem: detecting when students using a commercial essay evaluation system, Criterion, enter off-topic essays. Sometimes this is done in bad faith to trick the system; other times it is inadvertent, and the student has cut-...
Intensity of Relationship Between Words: Using Word Triangles in Topic Discovery for Short Texts
Uncovering latent topics from given texts is an important task that helps people cope with information overload. This has made topic models an active area of study. However, most of the texts available daily are short, so traditional topic models may not perform well because of data sparsity. Popular models for short texts concentrate on word co-occurrence patterns in the corpus. However, they do no...
Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words
We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic l...
Identifying off-topic student essays without topic-specific training data
Educational assessment applications, as well as other natural-language interfaces, need some mechanism for validating user responses. If the input provided to the system is infelicitous or uncooperative, the proper response may be to simply reject it, to route it to a bin for special processing, or to ask the user to modify the input. If problematic user input is instead handled as if it were t...